Here we test whether SF3B1 changes its affinity for defined binding sites in the mutant condition compared to the wild type (wt). We use the DESeq2 negative binomial (NB) model with a likelihood ratio test (LRT) to compare changes at binding sites to changes in the respective host gene. To account for changes in RNA abundance, we approximate the transcript expression level by all iCLIP counts that do not fall into a binding site. We call these background counts. The DESeq2 model uses the background counts to find binding sites that change independently of the underlying transcript level, thus disentangling the two signals.
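To illustrate how comparing a binding site to its background separates the two signals, the following toy example fits a full model with a condition-by-type interaction and a reduced model without it, then compares both with a likelihood ratio test. It is a minimal sketch only: it uses a base R Poisson GLM as a stand-in for the DESeq2 NB model, treats a single binding site in isolation, and the helper name `lrt_interaction` and all counts are made up for illustration.

```r
# Toy LRT for one binding site (bs) against its gene background (bg).
# NOTE: Poisson GLM used as a simplified stand-in for the DESeq2 NB model.
lrt_interaction <- function(bs_wt, bs_mut, bg_wt, bg_mut) {
  d <- data.frame(
    count = c(bs_wt, bs_mut, bg_wt, bg_mut),
    condition = factor(
      rep(c("wt", "mut", "wt", "mut"),
          times = c(length(bs_wt), length(bs_mut),
                    length(bg_wt), length(bg_mut))),
      levels = c("wt", "mut")),
    type = factor(
      rep(c("bs", "bg"),
          times = c(length(bs_wt) + length(bs_mut),
                    length(bg_wt) + length(bg_mut))),
      levels = c("bg", "bs"))
  )
  # full model: the binding site may change differently from the background
  full <- glm(count ~ condition + type + condition:type,
              family = poisson, data = d)
  # reduced model: binding site and background change by the same factor
  reduced <- glm(count ~ condition + type, family = poisson, data = d)
  lrt <- anova(reduced, full, test = "Chisq")
  list(
    # interaction coefficient on the log2 scale: change in binding
    # beyond the change of the background
    log2fc = unname(coef(full)["conditionmut:typebs"]) / log(2),
    p = lrt[2, "Pr(>Chi)"]
  )
}

# made-up background counts: the gene roughly doubles in the mutant
bg_wt  <- c(1000, 1100, 950)
bg_mut <- c(2000, 2100, 1900)

# case A: the binding site doubles together with the gene
case_a <- lrt_interaction(c(100, 110, 95), c(200, 210, 190), bg_wt, bg_mut)

# case B: the binding site stays flat although the gene doubles
case_b <- lrt_interaction(c(100, 110, 95), c(100, 105, 98), bg_wt, bg_mut)
```

In case A the binding site simply follows the gene, so the interaction term is zero and the test is not significant. In case B the gene doubles while the binding site stays flat, which shows up as a negative interaction, that is, relatively reduced binding.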
First, binding sites are overlapped with the classified peak regions from the clustering approach. Each overlapping binding site is thereby assigned to one of the four categories (DoubleWide, DoubleNarrow, Single, Rest). The DoubleWide peak class is further split into a left and a right side. Each side can overlap multiple binding sites, resulting in multiple LFCs, P values, etc. for each side. To resolve this, the values from the binding site with the lowest P value were taken as representative.
# assumes GenomicRanges, rtracklayer and magrittr (for %>%) are attached,
# and that getRanges() is provided by the package behind bds.diff

# load peak classification from clustering
peakClass = rtracklayer::import.bed("../02_peakClassification/data/rngClassified.bed")
peakClass$group = sapply(strsplit(peakClass$name, "_"), `[`, 1)

# group peaks by classification
peakList = split(peakClass, peakClass$group)

# bsRes = searchRes$obj
bsRes = getRanges(bds.diff)

# get binding site LFCs, P values, etc. for binding sites in peak regions
olBs = subsetByOverlaps(bsRes, peakList$DoubleNarrow)
df1 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "DoubleNarrow")

olBs = subsetByOverlaps(bsRes, peakList$DoubleWide)
df2 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "DoubleWide")

olBs = subsetByOverlaps(bsRes, peakList$Rest)
df3 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "Rest")

olBs = subsetByOverlaps(bsRes, peakList$SinglePeak)
df4 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "SinglePeak")

# split double-wide peaks into left and right side
# -> based on midpoint
# -> left/right switches with the strand
doublePeaks = peakList$DoubleWide
doublePeaks$doublePeakID = doublePeaks$name
doublePeaksP = doublePeaks[strand(doublePeaks) == "+"]
doublePeaksM = doublePeaks[strand(doublePeaks) == "-"]

# keep the peak IDs in the order in which the parts are concatenated below
# (plus strand first, then minus strand), so that IDs and ranges stay aligned
doublePeakIDs = c(doublePeaksP$doublePeakID, doublePeaksM$doublePeakID)

# cut each peak into consecutive 41-nt windows
doublePeaksP = as(slidingWindows(x = doublePeaksP, width = 41, step = 41), "GRangesList")
doublePeaksM = as(slidingWindows(x = doublePeaksM, width = 41, step = 41), "GRangesList")

# first/second window on the plus strand, swapped on the minus strand
doublePart1P = as(lapply(doublePeaksP, function(x){x[1]}), "GRangesList") %>% unlist()
doublePart2P = as(lapply(doublePeaksP, function(x){x[2]}), "GRangesList") %>% unlist()
doublePart1M = as(lapply(doublePeaksM, function(x){x[2]}), "GRangesList") %>% unlist()
doublePart2M = as(lapply(doublePeaksM, function(x){x[1]}), "GRangesList") %>% unlist()

doublePart1 = c(doublePart1P, doublePart1M)
mcols(doublePart1)$doublePeakID = doublePeakIDs
export(doublePart1, con = "./data/LeftPartFar.bed", format = "BED")

doublePart2 = c(doublePart2P, doublePart2M)
mcols(doublePart2)$doublePeakID = doublePeakIDs
export(doublePart2, con = "./data/RightPartClose.bed", format = "BED")

# binding sites overlapping the left and the right part
olBs = subsetByOverlaps(bsRes, doublePart1)
df5 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "DoubleWide-Left")

olBs = subsetByOverlaps(bsRes, doublePart2)
df6 = data.frame(BsID = olBs$bsID, GeneID = olBs$geneID, lfc = olBs$bs.log2FoldChange, padj = olBs$bs.padj, peakType = "DoubleWide-Right")
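Because each side of a double peak can overlap several binding sites, one representative per side has to be chosen. The selection by lowest P value described above could look roughly like the sketch below. It is an assumption-laden illustration, not the code used here: the `peakID` column, the helper `pick_representative` and the example values are hypothetical, and the adjusted P value (`padj`) is assumed to be the selection criterion.

```r
# Toy table: binding sites that overlap the same side of a double peak
# share a (hypothetical) peakID; all values are made up.
hits <- data.frame(
  peakID = c("p1", "p1", "p2", "p2", "p2"),
  BsID   = c("BS1", "BS2", "BS3", "BS4", "BS5"),
  lfc    = c(0.4, -1.2, 0.1, 0.9, -0.3),
  padj   = c(0.20, 0.01, 0.50, 0.03, NA),
  stringsAsFactors = FALSE
)

# Keep, per peak, the binding site with the lowest adjusted P value.
pick_representative <- function(df) {
  df <- df[!is.na(df$padj), ]            # untested sites cannot be chosen
  df <- df[order(df$peakID, df$padj), ]  # lowest padj first within each peak
  df[!duplicated(df$peakID), ]           # first row per peak is the minimum
}

representative <- pick_representative(hits)
```

Sorting once and dropping duplicates keeps the LFC, P value and binding site ID of the chosen site together in one row, which is convenient when the left and right sides are compared later.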